Skip to content

Conversation

@Soulter
Copy link
Member

@Soulter Soulter commented Dec 4, 2025

  • Added MySQL database implementation with connection settings and session management.
  • Introduced a new AstrBotMySQLSettings class for configuration.
  • Updated database helper functions to support both SQLite and MySQL.
  • Enhanced platform statistics retrieval with time series data for both database types.
  • Refactored existing SQLite methods to align with new database structure and functionality.

closes: #3848

Modifications / 改动点

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果


Checklist / 检查清单

  • 😊 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。/ If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
  • 👀 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。/ My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到了 requirements.txtpyproject.toml 文件相应位置。/ I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 我的更改没有引入恶意代码。/ My changes do not introduce malicious code.

Summary by Sourcery

在现有 SQLite 的基础上集成可配置的 MySQL 数据库支持,并扩展共享的数据库抽象层和统计处理逻辑,使其可在两种后端上通用。

New Features:

  • 添加 MySQLDatabase 实现,支持异步连接/会话管理,并完整支持会话(conversations)、角色配置(personas)、偏好设置(preferences)、会话(sessions)、附件(attachments)以及平台消息历史(platform message history)。
  • 引入 AstrBotMySQLSettings 和数据库工厂(database factory),通过环境变量配置在运行时选择 SQLite 或 MySQL。
  • 为两种数据库类型都提供时间序列平台统计 API,以支持仪表盘图表渲染。

Enhancements:

  • 统一平台统计聚合逻辑,在所有后端上使用汇总计数和分组查询,包括按平台汇总的总数以及按小时的时间序列数据。
  • 重构仪表盘统计路由以使用新的跨数据库统计 API,并返回简化且结构化的响应数据。
  • 扩展 BaseDatabase 抽象,增加数据库类型元数据和时间序列统计方法,并为 MySQL 后端禁用传统的 SQLite 迁移逻辑。
  • 调整 webchat 会话迁移逻辑,通过聚合来推导 sender_name,以兼容分组查询。

Build:

  • 添加 aiomysql 依赖以支持异步 MySQL。
Original summary in English

Summary by Sourcery

Integrate configurable MySQL database support alongside existing SQLite and extend shared database abstractions and stats handling to work across both backends.

New Features:

  • Add a MySQLDatabase implementation with async connection/session management and full support for conversations, personas, preferences, sessions, attachments, and platform message history.
  • Introduce AstrBotMySQLSettings and a database factory to select SQLite or MySQL at runtime via environment configuration.
  • Expose time‑series platform statistics APIs for both database types to support dashboard charting.

Enhancements:

  • Unify platform statistics aggregation to use summed counts and grouped queries across backends, including per-platform totals and hourly time‑series data.
  • Refactor dashboard statistics route to consume the new cross‑database stats APIs and return a simplified, structured payload.
  • Extend the BaseDatabase abstraction with database type metadata and time‑series stats methods, and disable legacy SQLite migration for MySQL backends.
  • Adjust webchat session migration to derive sender_name via aggregation for compatibility with grouped queries.

Build:

  • Add aiomysql dependency for async MySQL support.

- Added MySQL database implementation with connection settings and session management.
- Introduced a new AstrBotMySQLSettings class for configuration.
- Updated database helper functions to support both SQLite and MySQL.
- Enhanced platform statistics retrieval with time series data for both database types.
- Refactored existing SQLite methods to align with new database structure and functionality.

closes: #3848
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - 我已经审阅了你的修改,这里是一些反馈:

  • 新的平台统计聚合逻辑(get_platform_statsget_platform_stats_time_series)目前在 SQLite 和 MySQL 之间存在重复,差异仅在 SQL 方言上;建议将共有的聚合结构提取到一个共享的辅助函数里,以减少偏差并保持各数据库间行为一致。
  • mysql.py 中,NOT_GIVEN 被定义为一个 TypeVar,但在 update_persona 中被用作运行时的哨兵值;为了更清晰且更符合惯例,建议使用一个唯一的对象实例(例如 NOT_GIVEN = object())来承担这个角色。
  • 迁移辅助函数现在会在 DatabaseType.MYSQL 时直接短路,这意味着如果用户切换到 MySQL,则任何现有的 SQLite 数据都会被忽略;如果这是有意为之,建议在配置或代码注释中将该行为说明清楚,以避免对“迁移数据缺失”的困惑。
给 AI Agent 的提示词
Please address the comments from this code review:

## Overall Comments
- 新的平台统计聚合逻辑(`get_platform_stats``get_platform_stats_time_series`)目前在 SQLite 和 MySQL 之间存在重复,差异仅在 SQL 方言上;建议将共有的聚合结构提取到一个共享的辅助函数里,以减少偏差并保持各数据库间行为一致。
-`mysql.py` 中,`NOT_GIVEN` 被定义为一个 `TypeVar`,但在 `update_persona` 中被用作运行时的哨兵值;为了更清晰且更符合惯例,建议使用一个唯一的对象实例(例如 `NOT_GIVEN = object()`)来承担这个角色。
- 迁移辅助函数现在会在 `DatabaseType.MYSQL` 时直接短路,这意味着如果用户切换到 MySQL,则任何现有的 SQLite 数据都会被忽略;如果这是有意为之,建议在配置或代码注释中将该行为说明清楚,以避免对“迁移数据缺失”的困惑。

## Individual Comments

### Comment 1
<location> `astrbot/core/db/mysql.py:103-106` </location>
<code_context>
+        async with self.AsyncSessionLocal() as session:
+            yield session
+
+    async def initialize(self) -> None:
+        """Initialize the database by creating tables if they do not exist."""
+        async with self.engine.begin() as conn:
+            await conn.run_sync(SQLModel.metadata.create_all)
+            await conn.commit()
+
</code_context>

<issue_to_address>
**issue (bug_risk):** 避免在 `engine.begin()` 上下文中调用 `conn.commit()`;该上下文管理器已经处理事务的提交和回滚。

在 `async with self.engine.begin() as conn:` 中,SQLAlchemy 会开启一个事务,并在退出时自动提交或回滚。额外的 `await conn.commit()` 是不必要的,并且在某些配置下可能导致错误。请仅依赖上下文管理器的自动提交行为。
</issue_to_address>

### Comment 2
<location> `astrbot/core/db/mysql.py:840` </location>
<code_context>
+            result = await session.execute(query)
+            return list(result.scalars().all())
+
+    async def update_platform_session(
+        self,
+        session_id: str,
+        display_name: str | None = None,
+    ) -> None:
+        """Update a Platform session's updated_at timestamp and optionally display_name."""
+        async with self.get_db() as session:
+            session: AsyncSession
+            async with session.begin():
+                values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
+                if display_name is not None:
+                    values["display_name"] = display_name
+
+                await session.execute(
+                    update(PlatformSession)
+                    .where(col(PlatformSession.session_id) == session_id)
</code_context>

<issue_to_address>
**suggestion (bug_risk):** 请让 `updated_at` 的时间戳时区处理方式与模型的其他字段保持一致,以避免混用“有时区/无时区”的时间对象。

该模块中其他时间戳(例如统计信息和消息历史)使用的是无时区的 `datetime.now()`,因此这里是唯一一个使用 UTC 有时区时间的字段。根据 `PlatformSession.updated_at` 的定义方式不同,这种不一致可能会导致警告或比较问题。请在整个数据库层选择一种统一的约定(统一使用 UTC 有时区,或统一使用无时区),并一致地应用。

```suggestion
                values: dict[str, T.Any] = {"updated_at": datetime.now()}
```
</issue_to_address>

### Comment 3
<location> `astrbot/core/db/migration/migra_webchat_session.py:41` </location>
<code_context>
                 select(
                     col(PlatformMessageHistory.user_id),
-                    col(PlatformMessageHistory.sender_name),
+                    func.max(PlatformMessageHistory.sender_name).label("sender_name"),
                     func.min(PlatformMessageHistory.created_at).label("earliest"),
                     func.max(PlatformMessageHistory.updated_at).label("latest"),
</code_context>

<issue_to_address>
**question (bug_risk):** 使用 `MAX` 来聚合 `sender_name` 能解决 SQL 分组问题,但会产生一个任意的名字;请确认这是否符合预期语义。

`func.max` 可以满足 `ONLY_FULL_GROUP_BY` 的要求,但它会选择字典序中最大的那个名字,在一个分组内这基本上是任意的。如果被选中的 `sender_name` 在语义或 UI 上有意义,建议考虑基于特定行(例如最早消息或最新消息)来派生,而不是依赖 `MAX`。
</issue_to_address>

Sourcery 对开源项目是免费的——如果你喜欢我们的评审,请考虑分享它们 ✨
帮我变得更有用!请在每条评论上点击 👍 或 👎,我会根据你的反馈改进后续评审。
Original comment in English

Hey there - I've reviewed your changes - here's some feedback:

  • The new platform stats aggregation logic (get_platform_stats and get_platform_stats_time_series) is now duplicated between SQLite and MySQL with only SQL dialect differences; consider extracting the common aggregation shape to a shared helper to reduce drift and keep behavior aligned across databases.
  • In mysql.py, NOT_GIVEN is defined as a TypeVar but used as a runtime sentinel in update_persona; it would be clearer and more conventional to use a unique object instance (e.g. NOT_GIVEN = object()) for this purpose.
  • The migration helper now short-circuits for DatabaseType.MYSQL, meaning any existing SQLite data is ignored if a user switches to MySQL; if this is intentional, consider making the behavior explicit in configuration or code comments to avoid confusion about missing migrated data.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new platform stats aggregation logic (`get_platform_stats` and `get_platform_stats_time_series`) is now duplicated between SQLite and MySQL with only SQL dialect differences; consider extracting the common aggregation shape to a shared helper to reduce drift and keep behavior aligned across databases.
- In `mysql.py`, `NOT_GIVEN` is defined as a `TypeVar` but used as a runtime sentinel in `update_persona`; it would be clearer and more conventional to use a unique object instance (e.g. `NOT_GIVEN = object()`) for this purpose.
- The migration helper now short-circuits for `DatabaseType.MYSQL`, meaning any existing SQLite data is ignored if a user switches to MySQL; if this is intentional, consider making the behavior explicit in configuration or code comments to avoid confusion about missing migrated data.

## Individual Comments

### Comment 1
<location> `astrbot/core/db/mysql.py:103-106` </location>
<code_context>
+        async with self.AsyncSessionLocal() as session:
+            yield session
+
+    async def initialize(self) -> None:
+        """Initialize the database by creating tables if they do not exist."""
+        async with self.engine.begin() as conn:
+            await conn.run_sync(SQLModel.metadata.create_all)
+            await conn.commit()
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Avoid calling `conn.commit()` inside `engine.begin()`; the context manager already manages the transaction.

Within `async with self.engine.begin() as conn:`, SQLAlchemy opens a transaction and commits or rolls back on exit. The extra `await conn.commit()` is unnecessary and can cause errors in some configurations. Please rely on the context manager’s automatic commit instead.
</issue_to_address>

### Comment 2
<location> `astrbot/core/db/mysql.py:840` </location>
<code_context>
+            result = await session.execute(query)
+            return list(result.scalars().all())
+
+    async def update_platform_session(
+        self,
+        session_id: str,
+        display_name: str | None = None,
+    ) -> None:
+        """Update a Platform session's updated_at timestamp and optionally display_name."""
+        async with self.get_db() as session:
+            session: AsyncSession
+            async with session.begin():
+                values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
+                if display_name is not None:
+                    values["display_name"] = display_name
+
+                await session.execute(
+                    update(PlatformSession)
+                    .where(col(PlatformSession.session_id) == session_id)
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Align timestamp timezone handling for `updated_at` with the rest of the model to avoid mixed aware/naive datetimes.

Other timestamps in this module (e.g., stats and message history) use naive `datetime.now()`, so this is the only UTC-aware field. Depending on how `PlatformSession.updated_at` is defined, this mismatch can cause warnings or comparison issues. Please choose a single convention (UTC-aware or naive) and apply it consistently across the DB layer.

```suggestion
                values: dict[str, T.Any] = {"updated_at": datetime.now()}
```
</issue_to_address>

### Comment 3
<location> `astrbot/core/db/migration/migra_webchat_session.py:41` </location>
<code_context>
                 select(
                     col(PlatformMessageHistory.user_id),
-                    col(PlatformMessageHistory.sender_name),
+                    func.max(PlatformMessageHistory.sender_name).label("sender_name"),
                     func.min(PlatformMessageHistory.created_at).label("earliest"),
                     func.max(PlatformMessageHistory.updated_at).label("latest"),
</code_context>

<issue_to_address>
**question (bug_risk):** Aggregating `sender_name` with `MAX` fixes SQL grouping but produces an arbitrary name; confirm this matches the intended semantics.

`func.max` satisfies `ONLY_FULL_GROUP_BY` but picks the lexicographically greatest name, which is essentially arbitrary within the group. If the chosen `sender_name` has semantic or UI significance, consider deriving it from a specific row (e.g., earliest or latest message) rather than relying on `MAX`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

async with self.get_db() as session:
session: AsyncSession
async with session.begin():
values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): 请让 updated_at 的时间戳时区处理方式与模型的其他字段保持一致,以避免混用“有时区/无时区”的时间对象。

该模块中其他时间戳(例如统计信息和消息历史)使用的是无时区的 datetime.now(),因此这里是唯一一个使用 UTC 有时区时间的字段。根据 PlatformSession.updated_at 的定义方式不同,这种不一致可能会导致警告或比较问题。请在整个数据库层选择一种统一的约定(统一使用 UTC 有时区,或统一使用无时区),并一致地应用。

Suggested change
values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
values: dict[str, T.Any] = {"updated_at": datetime.now()}
Original comment in English

suggestion (bug_risk): Align timestamp timezone handling for updated_at with the rest of the model to avoid mixed aware/naive datetimes.

Other timestamps in this module (e.g., stats and message history) use naive datetime.now(), so this is the only UTC-aware field. Depending on how PlatformSession.updated_at is defined, this mismatch can cause warnings or comparison issues. Please choose a single convention (UTC-aware or naive) and apply it consistently across the DB layer.

Suggested change
values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
values: dict[str, T.Any] = {"updated_at": datetime.now()}

select(
col(PlatformMessageHistory.user_id),
col(PlatformMessageHistory.sender_name),
func.max(PlatformMessageHistory.sender_name).label("sender_name"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (bug_risk): 使用 MAX 来聚合 sender_name 能解决 SQL 分组问题,但会产生一个任意的名字;请确认这是否符合预期语义。

func.max 可以满足 ONLY_FULL_GROUP_BY 的要求,但它会选择字典序中最大的那个名字,在一个分组内这基本上是任意的。如果被选中的 sender_name 在语义或 UI 上有意义,建议考虑基于特定行(例如最早消息或最新消息)来派生,而不是依赖 MAX

Original comment in English

question (bug_risk): Aggregating sender_name with MAX fixes SQL grouping but produces an arbitrary name; confirm this matches the intended semantics.

func.max satisfies ONLY_FULL_GROUP_BY but picks the lexicographically greatest name, which is essentially arbitrary within the group. If the chosen sender_name has semantic or UI significance, consider deriving it from a specific row (e.g., earliest or latest message) rather than relying on MAX.

@LIghtJUNction
Copy link
Member

干脆用pg数据库吧

@Soulter
Copy link
Member Author

Soulter commented Dec 4, 2025

干脆用pg数据库吧

都可以支持

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]把默认的sqlite改为可选项

3 participants